OMI4papps: Optimisation, Modelling and Implementation for Highly Parallel Applications
Abstract
This article reports on the first results of the KONWIHR-II project OMI4papps at the Leibniz Supercomputing Centre (LRZ). The first part describes Apex-MAP, a tunable synthetic benchmark designed to simulate the performance of typical scientific applications. Apex-MAP mimics the common memory access patterns and the different computational intensities of scientific codes. We present an approach for modelling LRZ's application mix that uses performance counter measurements of real applications running on "HLRB II", an SGI Altix system based on 9728 Intel Montecito dual-cores. The second part shows how the Apex-MAP benchmark can be used to simulate the performance of two mathematical kernels frequently used in scientific applications: dense matrix-matrix multiplication and sparse matrix-vector multiplication. The performance of both kernels has been studied intensively on x86 cores and hardware accelerators. We compare the predicted performance with measured data to validate our Apex-MAP approach.

Volker Weinberg · Matthias Brehm · Iris Christadler
Leibniz-Rechenzentrum der Bayerischen Akademie der Wissenschaften, Boltzmannstr. 1, 85748 Garching bei München, Germany
e-mail: {volker.weinberg, matthias.brehm, iris.christadler}@lrz.de

1 Performance Modelling Using the Apex-MAP Benchmark

A simple synthetic benchmark with tunable, hardware-independent parameters that mimics the behaviour of typical scientific applications is very useful for evaluating new hardware platforms for a given job mix. Mapping application performance data measured on a production system to specific parameter combinations of the synthetic benchmark makes it possible to model the performance of a wide spectrum of applications with a simple approach.
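To make the idea of "tunable, hardware-independent parameters" more concrete, here is a minimal Python sketch of an Apex-MAP-style access loop. The abstract does not name the parameters, so this assumes the parameterisation commonly described for Apex-MAP: a data array of size M, contiguous blocks of length L (spatial locality), a power-law exponent alpha for the block start indices (temporal locality, where alpha = 1 gives uniform random starts and smaller values concentrate accesses on a small region), and a computational intensity C. Defining C as "multiply-add steps per loaded element" is an assumption, and the names `start_index` and `apex_map_kernel` are hypothetical. The sketch only illustrates the access pattern. Its Python run time says nothing about hardware performance.

```python
import random


def start_index(rng, M, L, alpha):
    """Draw a block start index between 0 and M - L from a power-law distribution.

    Assumed Apex-MAP-style generator: r**(1/alpha) with r uniform in [0, 1).
    alpha = 1 yields uniform random starts; alpha -> 0 concentrates starts
    near index 0, so the same data is reused more often (temporal locality).
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if not 0 < L <= M:
        raise ValueError("need 0 < L <= M")
    r = rng.random()
    return int((M - L) * r ** (1.0 / alpha))


def apex_map_kernel(data, L, alpha, C, n_blocks, seed=0):
    """Hypothetical Apex-MAP-like kernel.

    Reads n_blocks blocks of L contiguous elements (spatial locality) whose
    start indices come from start_index(), and applies C multiply-add steps
    to every loaded element (an assumed stand-in for computational intensity).
    """
    rng = random.Random(seed)
    M = len(data)
    total = 0.0
    for _ in range(n_blocks):
        s = start_index(rng, M, L, alpha)
        for x in data[s:s + L]:      # one contiguous block of length L
            y = x
            for _ in range(C):       # C multiply-adds per loaded element
                y = y * 0.5 + x
            total += y
    return total
```

For example, `apex_map_kernel(data, L=1, alpha=0.01, C=0, n_blocks=10_000)` models a latency-bound code with short, highly localised accesses, while `L=1024, alpha=1.0, C=8` models a streaming, more compute-heavy code. Fitting such parameter combinations to counter data measured for real applications is the mapping step described above.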
Journal: CoRR, abs/1001.1860 (2010)